Abstract

The spatial sign correlation (Dürre, Vogel and Fried, 2015) is a highly robust and easy-to-compute
bivariate correlation estimator based on the spatial sign covariance matrix. Since the estimator
is inefficient when the marginal scales strongly differ, a two-stage version was proposed. In the
first step, the observations are marginally standardized by means of a robust scale estimator, and
in the second step, the spatial sign correlation of the thus transformed data set is computed.
Dürre et al. (2015) give some evidence that the asymptotic distribution of the two-stage estimator
equals that of the spatial sign correlation at equal marginal scales by comparing their influence
functions and presenting simulation results, but give no formal proof. In the present paper, we
close this gap and establish the asymptotic normality of the two-stage spatial sign correlation
and compute its asymptotic variance for elliptical population distributions. We further derive a
variance-stabilizing transformation, similar to Fisher’s z-transform, and numerically compare the
small-sample coverage probabilities of several confidence intervals.
1. Introduction

The spatial sign of $x \in \mathbb{R}^p$ is defined as $s(x) = x/|x|$ for $x \neq 0$ and $s(0) = 0$, where $|\cdot|$ denotes
the Euclidean norm in $\mathbb{R}^p$. For a p-dimensional random variable X with distribution F and $t \in \mathbb{R}^p$,
$p \geq 2$, we call

$$S(F, t) = E\{ s(X - t)\, s(X - t)^T \}$$

the spatial sign covariance matrix (SSCM) of the distribution F with location t. Furthermore,
letting $t_n$ be an estimator for t and $\mathbb{X}_n = (X_1, \ldots, X_n)^T$ an n x p array, where $X_1, \ldots, X_n$ is a
random sample from the distribution F, we call

$$S_n = S_n(\mathbb{X}_n, t_n) = \frac{1}{n} \sum_{i=1}^{n} s(X_i - t_n)\, s(X_i - t_n)^T \tag{1}$$

the empirical spatial sign covariance matrix with location $t_n$. The term spatial sign covariance
matrix was coined by Visuri, Koivunen and Oja (2000). Dürre, Vogel and Tyler (2014) showed
consistency and asymptotic normality of $S_n$ under mild conditions on F and $t_n$. The estimator
$S_n$ has excellent robustness properties. Its influence function is bounded, and its asymptotic
breakdown point attains the optimal value of 1/2 (Croux et al., 2010).
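
The definition (1) can be sketched in a few lines of code. The following is a minimal illustration in plain Python, not the authors' implementation; the function names `spatial_sign` and `sscm` and the small data set are made up for this example.

```python
import math

def spatial_sign(x):
    """Return x/|x| for x != 0 and the zero vector for x == 0."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in x]

def sscm(data, location):
    """Empirical spatial sign covariance matrix, cf. (1).

    data:     list of n observations, each a list of p floats
    location: list of p floats (the location t_n)
    """
    n, p = len(data), len(location)
    s_n = [[0.0] * p for _ in range(p)]
    for x in data:
        s = spatial_sign([xi - ti for xi, ti in zip(x, location)])
        for j in range(p):
            for k in range(p):
                s_n[j][k] += s[j] * s[k] / n
    return s_n

# Illustrative data (hypothetical), centred at the origin.
sample = [[1.0, 2.0], [-2.0, 0.5], [0.3, -1.2], [4.0, 4.5], [-1.0, -3.0]]
S = sscm(sample, [0.0, 0.0])
```

Since every spatial sign of a non-zero vector has unit length, the trace of the resulting matrix equals one as long as no observation coincides with the location.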

We will assume below that $X_1, X_2, \ldots$ follow a continuous elliptical distribution, i.e., F has a
Lebesgue density f of the form

$$f(x) = \det(V)^{-1/2}\, g\{(x - \mu)^T V^{-1} (x - \mu)\}$$

for a location parameter $\mu \in \mathbb{R}^p$ and a symmetric, positive definite shape matrix $V \in \mathbb{R}^{p \times p}$. We
denote the class of continuous elliptical distributions with these parameters by $\mathcal{E}_p(\mu, V)$. The matrix
V is called the shape matrix of F since it describes the shape of the elliptical contour lines of the
density. The function $g : [0, \infty) \to [0, \infty)$ is called the elliptical generator of F. The specification
of V is unique only up to a multiplicative constant, and therefore V is often normalized, e.g., by
setting det(V) = 1 (Paindaveine, 2008; Frahm, 2009). Since we study scale-free aspects of F, where
the overall scale is irrelevant, it is more convenient not to fix the scale of the shape. We understand
the shape of an elliptical distribution as an equivalence class of positive definite matrices being
proportional to each other.

There is, up to scale, a one-to-one connection between $S(F, \mu)$ and the parameter V: both
share the same eigenvectors and the ordering of the respective eigenvalues. This makes the spatial
sign covariance matrix particularly popular for robust principal component analysis (e.g. Marden,
1999; Locantore et al., 1999; Croux et al., 2002; Gervini, 2008). However, the map between the
eigenvalues of V and $S(F, \mu)$ is only known explicitly for p = 2 (Croux et al., 2010; Vogel et al.,
2008). Making use of this result, Dürre et al. (2015) proposed a robust correlation estimator, called
the spatial sign correlation. Let

$$\begin{pmatrix} \hat s_{11} & \hat s_{12} \\ \hat s_{21} & \hat s_{22} \end{pmatrix} = S_n(\mathbb{X}_n, t_n)$$

denote the entries of $S_n(\mathbb{X}_n, t_n)$. Then the generalized correlation coefficient* $\rho = v_{12}/\sqrt{v_{11} v_{22}}$ can
be estimated by


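$$\hat\rho_n = \frac{\hat s_{12}}{\sqrt{(1/4 + \hat b)^2 - (\hat s_{11} - 1/2)^2}}, \tag{2}$$

where

$$\hat b = (\hat s_{11} - 1/2)^2 + \hat s_{12}^2.$$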


For a derivation of this estimator, see Dürre et al. (2015). There it is also shown that the spatial
sign correlation $\hat\rho_n$ is consistent for $\rho$ under ellipticity and asymptotically normal with asymptotic
variance

$$ASV(\hat\rho_n) = (1 - \rho^2)^2 + \tfrac{1}{2}(a + a^{-1})(1 - \rho^2)^{3/2}, \tag{3}$$

where $a = \sqrt{v_{11}/v_{22}}$. The asymptotic variance is evidently minimal for a = 1, i.e., for equal
marginal scales, but it can get arbitrarily large as a approaches $\infty$ or 0. Since our aim is to
estimate the correlation coefficient, which is invariant under marginal scale changes, it is therefore
reasonable to standardize the data marginally before computing the spatial sign correlation.

*We call $\rho = v_{12}/\sqrt{v_{11} v_{22}}$ the generalized correlation coefficient of the bivariate elliptical distribution F. It
is defined without any moment assumptions and coincides with the usual product moment correlation if second
moments are finite.
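
As a hedged illustration of the link between the SSCM and $\rho$ for p = 2, the sketch below uses the known eigenvalue relation (the eigenvalues of the SSCM are proportional to the square roots of those of V), which implies that V is proportional to the squared SSCM. The helper names `population_sscm`, `sscor_from_sscm` and `asv_sscor` are invented for this example, and the formula inside `sscor_from_sscm` is one algebraic form of the estimator derived from that relation; it is not claimed to be written exactly as in Dürre et al. (2015).

```python
import math

def population_sscm(rho, a=1.0):
    """Entries (s11, s12, s22) of the SSCM of a centred bivariate elliptical
    law with shape matrix V = [[a^2, a*rho], [a*rho, 1]] (hypothetical helper).

    Uses S = V^(1/2) / trace(V^(1/2)) and the closed form of the 2x2 matrix
    square root, V^(1/2) proportional to V + sqrt(det V) * I.
    """
    d = a * math.sqrt(1.0 - rho * rho)   # sqrt(det V)
    total = a * a + 1.0 + 2.0 * d        # trace of V + d*I
    return (a * a + d) / total, a * rho / total, (1.0 + d) / total

def sscor_from_sscm(s11, s12, s22):
    """Correlation implied by a 2x2 SSCM: V is proportional to S^2."""
    v11 = s11 * s11 + s12 * s12
    v22 = s22 * s22 + s12 * s12
    v12 = s12 * (s11 + s22)
    return v12 / math.sqrt(v11 * v22)

def asv_sscor(rho, a=1.0):
    """Asymptotic variance (3) of the spatial sign correlation."""
    q = 1.0 - rho * rho
    return q * q + 0.5 * (a + 1.0 / a) * q ** 1.5

rho_back = sscor_from_sscm(*population_sscm(0.5, a=3.0))
```

Evaluating `asv_sscor` for a fixed $\rho$ and increasingly unequal scales (large or small `a`) shows the growth of the asymptotic variance that motivates the marginal standardization.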

Let $\sigma(\cdot)$ denote a univariate scale measure or dispersion measure, i.e., for any univariate
distribution G it satisfies

$$\sigma(G^*_{a,b}) = |a|\, \sigma(G) \quad \text{for all } a, b \in \mathbb{R}, \tag{4}$$

where $G^*_{a,b}$ is the distribution of $Y^* = aY + b$ for $Y \sim G$. This may be the standard deviation
$\sigma_{SD} = \{E(Y - EY)^2\}^{1/2}$, but since the main purpose of studying spatial sign methods is their
robustness, robust measures like the median absolute deviation $\sigma_{MAD} = \mathrm{median}|Y - \mathrm{median}(Y)|$ or
the $Q_n$ scale measure $\sigma_Q = q_{1/4}(|Y - Y'|)$ (Rousseeuw and Croux, 1993) may be more appropriate.
Here, $Y'$ is an independent copy of Y, and median(Y) denotes the median of the distribution of
Y and $q_{1/4}(Y)$ its 1/4th quantile. Let further $\hat\sigma_n = \hat\sigma_n(\mathbb{Y}_n)$ denote the respective scale estimator,
which is, in principle, the measure $\sigma(\cdot)$ applied to the empirical distribution associated with the
univariate sample $\mathbb{Y}_n = (Y_1, \ldots, Y_n)^T$. But in many situations, the empirical version of the scale
measure is defined slightly differently for various reasons, e.g., the empirical standard deviation
is usually defined as $\hat\sigma_n(\mathbb{Y}_n) = \{(n - 1)^{-1} \sum_{i=1}^n (Y_i - \bar Y_n)^2\}^{1/2}$ instead of $\hat\sigma_n(\mathbb{Y}_n) = \{n^{-1} \sum_{i=1}^n (Y_i - \bar Y_n)^2\}^{1/2}$.
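
The scale equivariance (4) is easy to check numerically. The sketch below is only an illustration: `mad` is the raw median absolute deviation without a consistency factor, and `qn_type` is a simple O(n^2) first-quartile-of-pairwise-differences estimator in the spirit of $\sigma_Q$. It is not the $Q_n$ estimator of Rousseeuw and Croux (1993), which uses a specific order statistic and correction factors. The data are made up.

```python
import statistics

def mad(y):
    """Median absolute deviation about the median (no consistency factor)."""
    m = statistics.median(y)
    return statistics.median([abs(v - m) for v in y])

def qn_type(y):
    """First quartile of all pairwise absolute differences |y_i - y_j|, i < j."""
    diffs = [abs(y[i] - y[j]) for i in range(len(y)) for j in range(i + 1, len(y))]
    return statistics.quantiles(diffs, n=4)[0]

# Illustrative sample and an affine transformation y* = a*y + b.
y = [2.1, -0.4, 3.3, 0.8, 1.7, -1.2, 5.0, 0.1]
a, b = -3.0, 7.0
y_star = [a * v + b for v in y]

mad_ratio = mad(y_star) / mad(y)            # should be close to |a|
qn_ratio = qn_type(y_star) / qn_type(y)     # should be close to |a|
```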

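Returning to the general p-dimensional set-up, for any specific choice of $\sigma(\cdot)$, let $F_i$ denote
the ith margin of F, further $\sigma_i = \sigma(F_i)$ and $\hat\sigma_{i,n} = \hat\sigma_n(\mathbb{X}_n^{(i)})$, where $\mathbb{X}_n^{(i)}$ is the ith column of
$\mathbb{X}_n$, $1 \leq i \leq p$. Let

$$A = \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_p^{-1}), \qquad A_n = \mathrm{diag}(\hat\sigma_{1,n}^{-1}, \ldots, \hat\sigma_{p,n}^{-1}).$$

Then we define the two-stage spatial sign covariance matrix as

$$S_n(\mathbb{X}_n, t_n(\cdot), A_n) = S_n(\mathbb{X}_n A_n, t_n(\cdot)) = \frac{1}{n} \sum_{i=1}^{n} s(A_n X_i - t_n(\mathbb{X}_n A_n))\, s(A_n X_i - t_n(\mathbb{X}_n A_n))^T, \tag{5}$$

and the two-stage spatial sign correlation $\hat\rho_{\sigma,n}$ (of the sample $\mathbb{X}_n$ with location $t_n(\cdot)$ and inverse
scales $A_n$) as the spatial sign correlation $\hat\rho_n$, cf. (2), being applied to $S_n(\mathbb{X}_n, t_n(\cdot), A_n)$ instead of
$S_n(\mathbb{X}_n, t_n)$.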
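
The two steps can be sketched as follows for bivariate data. This is an illustrative pipeline under stated assumptions, not the authors' code: the scales are estimated by the raw MAD and the location by the componentwise median of the standardized data, both chosen here only because they need nothing beyond the standard library, and the correlation is recovered from the SSCM through the relation that V is proportional to the squared SSCM for p = 2. The function names and data are hypothetical.

```python
import math
import statistics

def mad(y):
    """Median absolute deviation about the median (no consistency factor)."""
    m = statistics.median(y)
    return statistics.median([abs(v - m) for v in y])

def two_stage_sscor(x, y):
    """Two-stage spatial sign correlation of the paired samples x and y."""
    # Step 1: marginal standardization by estimated scales.
    sx, sy = mad(x), mad(y)
    zx = [v / sx for v in x]
    zy = [v / sy for v in y]
    # Location is estimated from the transformed data.
    tx, ty = statistics.median(zx), statistics.median(zy)
    # Step 2: SSCM of the standardized, centred data, cf. (5).
    n = len(x)
    s11 = s12 = s22 = 0.0
    for u, v in zip(zx, zy):
        du, dv = u - tx, v - ty
        r2 = du * du + dv * dv
        if r2 == 0.0:          # s(0) = 0 contributes nothing
            continue
        s11 += du * du / r2 / n
        s12 += du * dv / r2 / n
        s22 += dv * dv / r2 / n
    # Correlation implied by the SSCM (V proportional to S^2 for p = 2).
    v11 = s11 * s11 + s12 * s12
    v22 = s22 * s22 + s12 * s12
    return s12 * (s11 + s22) / math.sqrt(v11 * v22)

# Illustrative data (hypothetical).
x = [0.2, 1.1, -0.7, 2.3, -1.5, 0.9, -0.1, 1.8]
y = [0.5, 0.9, -1.0, 2.0, -0.8, 0.2, 0.3, 1.1]
r = two_stage_sscor(x, y)
# The estimate does not change under marginal changes of scale and location.
r_rescaled = two_stage_sscor([10.0 * v + 3.0 for v in x], [0.01 * v - 5.0 for v in y])
```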

Remark 1.

(I) There is a subtle but important difference in the role that $t_n$ plays in (1) and in (5). When
defining $S_n(\mathbb{X}_n, t_n)$, the location $t_n$ may generally be any random vector, which may or may
not bear a connection to the sample $\mathbb{X}_n$. But usually, we take it to be an estimator computed
from the data, i.e., it is a function of $\mathbb{X}_n$. Whenever we want to invoke this latter meaning,
we write $t_n(\cdot)$ instead of $t_n$, particularly so in the definition of $S_n(\mathbb{X}_n, t_n(\cdot), A_n)$. Here it is
essential that $t_n(\cdot)$ is applied to the transformed data $\mathbb{X}_n A_n$. This will become important at
a later point when we consider different location estimates for the transformed data, cf. e.g.
Condition (C5) of Theorem 1 below.

(II) In the definition of the two-stage spatial sign covariance matrix, the data
are first standardized marginally, and then the location is estimated from the transformed
data. For all marginally equivariant location estimators - and this is the vast majority - the
order of these two steps is irrelevant. We call a multivariate location estimator $t_n$ marginally
equivariant if it satisfies $t_n(\mathbb{X}_n A + 1_n b^T) = A\, t_n(\mathbb{X}_n) + b$ for any p x p diagonal matrix A and
$b \in \mathbb{R}^p$, where $1_n$ denotes the n-vector of ones.
All location estimators being composed of univariate, affine equivariant location estimators
are marginally equivariant, but so are also all multivariate, affine equivariant location
estimators, including elliptical maximum likelihood estimators, M-estimators (Maronna, 1976;
Tyler, 1987), S-estimators (Davies, 1987), or constrained M-estimators (Kent and Tyler,
1996). However, there is one prominent example which lacks this property: the spatial
median (e.g. Oja, 2010, Section 6.2). We want to include this estimator since, due to its
conceptual similarity to the SSCM, it may be regarded as a default choice for $t_n$. The spatial
median has a variety of good properties such as uniqueness and computational and statistical
efficiency, see e.g. Magyar and Tyler (2011) and the references therein. Like the spatial
sign covariance matrix, the spatial median is inefficient at strongly shaped distributions.
Thus, when using the spatial median as location estimate, it is, from a conceptual
point of view, reasonable to compute it from the marginally standardized data. This is the
reason for choosing the order of steps as we do here: first standardization, then location
estimation. However, in practical situations, the difference to the estimator obtained when
reversing the order of these two steps tends to be rather small - also in case of the spatial
median.

(III) Finally, we would like to stress that we deliberately avoid any reference to the covariance
matrix of F. Our whole discussion of scale and correlation is completely moment-free. We
understand correlation generally as monotone dependence, with the moment-based Pearson
correlation coefficient being one, and without doubt the most popular, way of mathematically
quantifying this notion. Our main focus here is on estimating the generalized correlation
coefficient $\rho$ within the semiparametric model of elliptical distributions, but the concept of
spatial sign correlation can also be employed for defining a general, moment-free measure of
correlation. Requiring no moment assumptions is one major strength of spatial sign methods.


Following the introduction, the article has two further sections: Section 2 Asymptotic results
and Section 3 Simulations. The main result of the paper (Theorem 2) states that, at elliptical
distributions, the asymptotic variance of the two-stage spatial sign correlation is

$$ASV(\hat\rho_{\sigma,n}) = (1 - \rho^2)^2 + (1 - \rho^2)^{3/2},$$

which is shown by establishing the asymptotic equivalence of $\hat\rho_{\sigma,n}$ to the spatial sign correlation
at distributions with equal marginal scale. This was conjectured by Dürre et al. (2015), who
compare the corresponding influence functions. Towards this end, we investigate the asymptotics
of the two-stage spatial sign covariance matrix (Theorem 1). With the asymptotic distribution of
$\hat\rho_{\sigma,n}$ taking on a rather simple form, only depending on $\rho$, one can derive a variance-stabilizing
transformation analogous to Fisher’s z-transform. This is the content of Corollary 2. In Section
3, we numerically compare confidence intervals for $\rho$ based on the moment correlation and the
spatial sign correlation, both with and without variance-stabilizing transformation. All proofs are
deferred to the Appendix.
2. Asymptotic results

The first result concerns the asymptotic difference between $S_n(\mathbb{X}_n A_n, t_n(\cdot))$, the sample two-stage
SSCM with estimated location and scales, and $S_n(\mathbb{X}_n A, At)$, the sample two-stage SSCM
with known location and scales. We use the notation $X^{(j)}$ to denote the jth component of the
p-dimensional random vector X, j = 1, ..., p, likewise for other vectors.

Theorem 1. Let $t \in \mathbb{R}^p$ and X be a p-variate random vector with continuous distribution F
satisfying

(C1) $E|X - t|^{-3/2} < \infty$,

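(C2) $E\left\{ \dfrac{X^{(i)} - t^{(i)}}{|X - t|^2} \right\} = 0$ and $E\left\{ \dfrac{(X^{(i)} - t^{(i)})(X^{(j)} - t^{(j)})(X^{(k)} - t^{(k)})}{|X - t|^4} \right\} = 0$ for i, j, k = 1, ..., p.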

Let further A be a p x p diagonal matrix with positive diagonal entries $a_1, \ldots, a_p$, and $A_n$ a series
of random p x p diagonal matrices satisfying

(C3) $\sqrt{n}(A_n - A) \xrightarrow{\mathcal{L}} Z = \mathrm{diag}(Z_1, \ldots, Z_p)$

for some random diagonal matrix Z. Finally, let $\mathbb{X}_n = (X_1, \ldots, X_n)^T$ be an iid sample drawn from
F and $t_n(\cdot)$ a series of p-variate estimators satisfying

(C4) $\sqrt{n}\{t_n(\mathbb{X}_n) - t\} = O_P(1)$,

(C5) $\sqrt{n}\{t_n(\mathbb{X}_n A_n) - A_n t_n(\mathbb{X}_n)\} = O_P(1)$.


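Then $\sqrt{n}\{S_n(\mathbb{X}_n A_n, t_n(\cdot)) - S_n(\mathbb{X}_n A, At)\} \xrightarrow{\mathcal{L}} \Xi_p$ as $n \to \infty$ with

$$\Xi_p = A^{-1} Z\, S(F_0, 0) + S(F_0, 0)\, A^{-1} Z - 2 \sum_{j=1}^{p} \frac{Z_j}{a_j}\, \Gamma_j, \tag{6}$$

where $F_0$ is the distribution of $X_0 = A(X - t)$ and

$$\Gamma_j = E\left\{ \frac{(X_0^{(j)})^2\, X_0 X_0^T}{(X_0^T X_0)^2} \right\}.$$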


Theorem 1 has a long list of technical conditions. They are due to the fact that it is
formulated in a very general setting. We do not assume any specific model for the distribution
F. Also, the location estimator $t_n(\cdot)$, the scale estimator and even the location t are unspecified.
The above conditions are indeed a set of easy-to-verify regularity conditions, which are met in all
relevant situations, and many of which may be further relaxed at the price of more involved
technical derivations. We will review them one by one below.

Condition (C1) requires the probability mass of F to be not too strongly concentrated around
t. For instance, if F possesses a Lebesgue density f, it is sufficient (but not necessary) that
f is bounded at t. This condition also appears in Theorems 2 and 3 of Dürre et al. (2014)
and is, loosely speaking, due to the discontinuity of the spatial sign function at the origin.

Condition (C2) is indeed a somewhat restrictive condition as it basically imposes componentwise
symmetry of F around t. It is, however, a mere convenience assumption: it can be
dropped in favor of an additional term in $\Xi_p$ and a slightly stronger formulation of the other
conditions (basically joint convergence of $S_n$, $t_n$ and $A_n$). The proof of the more general
version runs analogously, with the main difference that Dürre et al. (2014, Theorem 3) instead
of Dürre et al. (2014, Theorem 2) would be used. However, our central result, Theorem 2
below, concerns elliptical distributions, for which (C2) is fulfilled. We therefore consider it
appropriate to include this symmetry condition here for the sake of simpler conditions and a
clearer exposition.

Condition (C3) is satisfied, e.g., if $A_n^{-2}$ is taken to be the diagonal of some p x p scatter matrix
estimator for which asymptotic normality has been shown. But also if $A_n^{-1}$ is composed of
univariate scale estimators (the default case here for computational reasons), it is
usually true. Specifically, if the univariate scale estimator allows a linearization, i.e.,

$$\hat\sigma_{j,n} - \sigma_j = \frac{1}{n} \sum_{i=1}^{n} f_j(X_i^{(j)}) + o_P(n^{-1/2}), \qquad j = 1, \ldots, p, \tag{7}$$

with $E\{f_j(X^{(j)})^2\} < \infty$, then $\sqrt{n}\{(\hat\sigma_{1,n}, \ldots, \hat\sigma_{p,n}) - (\sigma_1, \ldots, \sigma_p)\}^T = \sqrt{n}\, \mathrm{diag}(A_n^{-1} - A^{-1})$
converges to a multivariate normal distribution, and then so does $\sqrt{n}(A_n - A)$. Note that,
since A and $A_n$ are diagonal matrices, $\sqrt{n}(A_n^{-1} - A^{-1}) \xrightarrow{\mathcal{L}} \tilde Z$ implies $\sqrt{n}(A_n - A) =
A A_n \sqrt{n}(A^{-1} - A_n^{-1}) \xrightarrow{\mathcal{L}} -A^2 \tilde Z$, and hence $Z = -A^2 \tilde Z$ in distribution.

All estimators of practical relevance allow a linearization (7). For instance, for quantile-based
estimators, such as the MAD, this linearization is provided by the Bahadur representation; for
Gini’s mean difference, it is given by the Hoeffding decomposition (Hoeffding, 1948), and in
the case of U-quantiles, such as the $Q_n$ scale estimator (Rousseeuw and Croux, 1993), by a
combination of the two (Serfling, 1984; Wendler, 2011).


Condition (C4): This is a minimal standard assumption.

Condition (C5) is trivially fulfilled for any marginally equivariant location estimator, see Remark
1 (II). Primarily, this condition is necessary because we want to include the spatial median
as potential location estimator, and, for efficiency reasons, propose to standardize the data
prior to computing its spatial median (instead of scaling the spatial median along with the
data). Under (C3), the spatial median satisfies (C5) at elliptical distributions (Nevalainen
et al., 2007).


Finally, the continuity of F also is a mere convenience assumption, which prevents several data
points from coinciding with each other, and thus ensures that $t_n$ coincides with at most one observation.
Alternative assumptions are also discussed in Dürre et al. (2014).

In case of F being an elliptical distribution and t its symmetry center, explicit expressions for
S(F, t) appear to be known only for p = 2. In this case, $\Xi_p$ in (6) simplifies considerably.

Corollary 1. Let p = 2 and $X \sim F \in \mathcal{E}_2(t, V)$. Let $A = \mathrm{diag}(a_1, a_2)$ be a 2 x 2 diagonal
matrix with positive diagonal entries such that $V_0 = AVA$ has equal diagonal entries. Let further
$Z = \mathrm{diag}(Z_1, Z_2)$ be a random 2 x 2 diagonal matrix.
Then $\Xi_2$ from Theorem 1 is

$$\Xi_2 = \zeta \begin{pmatrix} Z_1/a_1 - Z_2/a_2 & 0 \\ 0 & Z_2/a_2 - Z_1/a_1 \end{pmatrix},$$

where $\zeta = (1 - \sqrt{1 - \rho^2})/(2\rho^2)$ if $\rho \neq 0$ and $\zeta = 1/4$ if $\rho = 0$, and $\rho = v_{12} (v_{11} v_{22})^{-1/2}$.

An important implication of Corollary 1 is that, at elliptical population distributions, the
asymptotic distribution of the off-diagonal element of the two-dimensional two-stage SSCM is the
same as that of the off-diagonal element of the ordinary SSCM at the corresponding distribution
with equal marginal scales. Building on this observation, we can derive the asymptotic distribution
of the two-stage spatial sign correlation by means of a generalized version of the delta method.

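Theorem 2. Let p = 2 and $X \sim F \in \mathcal{E}_2(t, V)$ satisfy Condition (C1) of Theorem 1. Let A, $A_n$
and $t_n(\cdot)$ be as in Theorem 1, satisfying Conditions (C3) to (C5) and with the further property that
$V_0 = AVA$ has equal diagonal entries. Then

$$\sqrt{n}(\hat\rho_{\sigma,n} - \rho) \xrightarrow{\mathcal{L}} N\left(0,\; (1 - \rho^2)^2 + (1 - \rho^2)^{3/2}\right). \tag{8}$$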


We have the following remarks about Theorem 2.

Remark 2.

(I) Comparing (8) to (3), we find that, at any elliptical distribution, the spatial sign correlation
with the margins being standardized beforehand by the true scales and the spatial sign
correlation with the margins being standardized by estimated scales have the same asymptotic
efficiency. In fact, we show in the Appendix that they are asymptotically equivalent. In other
words, the loss for not knowing the scale is nil asymptotically, and this is true regardless of
the scale estimator used. Any scale function $\sigma(\cdot)$ satisfying (4) yields that $X_0 = AX$
has equal marginal scales if X is elliptical. Also, the finite-sample variances of the spatial
sign correlation with known and estimated scales hardly differ, as the simulations in Dürre
et al. (2015) indicate.

(II) At elliptical distributions with finite fourth moments, the asymptotic variance of the product
moment correlation is $(1 + \kappa/3)(1 - \rho^2)^2$, where $\kappa$ is the marginal excess kurtosis. Thus under
normality, where $\kappa = 0$, the additional term $(1 - \rho^2)^{3/2}$ may be viewed as the price to pay
efficiency-wise for the gain in robustness when using the spatial sign correlation instead of
the moment correlation.

(III) In case of a two-dimensional elliptical distribution, Condition (C1) is fulfilled if $g(z) = O(z^{-1/4+\delta})$
as $z \to 0$ for some $\delta > 0$.
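
The efficiency comparison in Remark 2 (II) can be made concrete with a few lines of arithmetic. The sketch below merely evaluates the two asymptotic variance formulas stated above; the function names are chosen for this example, and the value $\kappa = 6$ used for the $t_5$ distribution is the standard excess kurtosis $6/(\nu - 4)$ of a t-distribution with $\nu = 5$ degrees of freedom.

```python
import math

def asv_two_stage_sscor(rho):
    """Asymptotic variance of the two-stage spatial sign correlation, cf. (8)."""
    q = 1.0 - rho * rho
    return q * q + q ** 1.5

def asv_pearson(rho, kappa=0.0):
    """Asymptotic variance of the moment correlation at an elliptical
    distribution with marginal excess kurtosis kappa."""
    q = 1.0 - rho * rho
    return (1.0 + kappa / 3.0) * q * q

# Ratio of asymptotic standard deviations under normality at rho = 0.
sd_ratio_normal = math.sqrt(asv_two_stage_sscor(0.0) / asv_pearson(0.0))

# At the t5 distribution (kappa = 6) the moment correlation has the larger variance.
pearson_t5 = asv_pearson(0.0, kappa=6.0)
sscor_any = asv_two_stage_sscor(0.0)
```

Under normality and $\rho = 0$ the ratio of standard deviations is $\sqrt{2}$, so asymptotic confidence intervals based on the spatial sign correlation are longer by that factor, while at heavier tails the ordering reverses.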


The asymptotic distribution of $\hat\rho_{\sigma,n}$ only depends on $\rho$, but not on the elliptical generator
g or any other characteristic of the population distribution. Therefore the two-stage spatial sign
correlation is very well suited for nonparametric and robust correlation testing. Similarly to Fisher’s
z-transformation for the moment correlation under normality (Fisher, 1921; Hotelling, 1953), one
can find a variance-stabilizing transformation for the spatial sign correlation under ellipticity.

Figure 1: Variance-stabilizing transformations (left) and their derivatives (right) for the spatial sign correlation (solid)
and the Pearson moment correlation, i.e., Fisher’s z-transform (dashed).


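Corollary 2. Under the conditions of Theorem 2 we have $\sqrt{n}\{h(\hat\rho_{\sigma,n}) - h(\rho)\} \xrightarrow{\mathcal{L}} N(0, 1)$ with

$$h(x) = \frac{s(x)}{\sqrt{2}} \left\{ \arcsin\left( \frac{3(1 - \sqrt{1 - x^2}) - 2}{\sqrt{1 - x^2} + 1} \right) + \frac{\pi}{2} \right\},$$

where $s(\cdot)$ denotes the (in this case univariate) sign function.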

As can be seen in Figure 1, the transformation h is similar to Fisher’s z-transform $x \mapsto
\log\{(1 + x)/(1 - x)\}/2$. There are two main differences: first, h is flatter, with a smaller derivative
throughout, reflecting the larger asymptotic variance of the spatial sign correlation under normality,
and second, h is bounded, attaining only values between $-\pi/\sqrt{2}$ and $\pi/\sqrt{2}$. To construct
confidence intervals, its inverse function $h^{-1} : [-\pi/\sqrt{2}, \pi/\sqrt{2}] \to [-1, 1]$ is also of interest. It is
given by

$$h^{-1}(y) = s(y)\, \frac{2^{3/2} \sqrt{1 - \cos(\sqrt{2}\, y)}}{3 - \cos(\sqrt{2}\, y)}.$$

Based on Corollary 2, one can derive asymptotic level-$\alpha$ tests for the generalized correlation
coefficient $\rho$ of a bivariate elliptical distribution, which are robust and very accurate also in small
samples, as the results of Section 3 below indicate. For instance, a two-sided one-sample test for $\rho$
based on $\hat\rho_{\sigma,n}$ would reject the null hypothesis $\rho = \rho_0$ at the significance level $\alpha$ if the test statistic

$$T_{1,n} = n \{h(\hat\rho_{\sigma,n}) - h(\rho_0)\}^2$$

exceeds $\chi^2_{1;1-\alpha}$, i.e., the $1 - \alpha$ quantile of the $\chi^2$ distribution with one degree of freedom. Likewise,
for two samples of sizes $n_1$ and $n_2$ and generalized correlation coefficients $\rho^{(1)}$ and $\rho^{(2)}$, respectively,
the null hypothesis $\rho^{(1)} = \rho^{(2)}$ is rejected if

$$T_{2,n} = \frac{n_1 n_2}{n_1 + n_2} \{h(\hat\rho^{(1)}_{\sigma,n_1}) - h(\hat\rho^{(2)}_{\sigma,n_2})\}^2$$

is larger than $\chi^2_{1;1-\alpha}$, where $\hat\rho^{(i)}_{\sigma,n_i}$, i = 1, 2, denote the two-stage spatial sign correlations computed
from the two samples. Similarly, one can construct one-sided and k-sample tests.
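
A possible way to turn these statements into code is sketched below. It is an illustration under the stated asymptotics, not the authors' implementation: the function names are hypothetical, the confidence interval is obtained by back-transforming $h(\hat\rho_{\sigma,n}) \pm z_{1-\alpha/2}/\sqrt{n}$ (truncated to the range of h), and the $\chi^2_1$ quantile is computed as the square of the corresponding standard normal quantile.

```python
import math
from statistics import NormalDist

H_MAX = math.pi / math.sqrt(2.0)

def h(x):
    w = math.sqrt(max(0.0, 1.0 - x * x))
    arg = max(-1.0, min(1.0, (3.0 * (1.0 - w) - 2.0) / (w + 1.0)))
    return math.copysign((math.asin(arg) + math.pi / 2.0) / math.sqrt(2.0), x)

def h_inv(y):
    c = math.cos(math.sqrt(2.0) * abs(y))
    return math.copysign(2.0 ** 1.5 * math.sqrt(max(0.0, 1.0 - c)) / (3.0 - c), y)

def confidence_interval(rho_hat, n, level=0.95):
    """Asymptotic confidence interval for rho based on the transformation h."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    center = h(rho_hat)
    lower = max(-H_MAX, center - z / math.sqrt(n))
    upper = min(H_MAX, center + z / math.sqrt(n))
    return h_inv(lower), h_inv(upper)

def one_sample_test(rho_hat, rho_0, n, alpha=0.05):
    """Return (T_1n, reject) for the two-sided test of rho = rho_0."""
    stat = n * (h(rho_hat) - h(rho_0)) ** 2
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2   # chi-square(1) quantile
    return stat, stat > critical

def two_sample_test(rho_hat_1, n1, rho_hat_2, n2, alpha=0.05):
    """Return (T_2n, reject) for the test of equal correlations."""
    stat = n1 * n2 / (n1 + n2) * (h(rho_hat_1) - h(rho_hat_2)) ** 2
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    return stat, stat > critical

ci_low, ci_high = confidence_interval(0.0, 10000)
scaled_length = (ci_high - ci_low) * math.sqrt(10000)
```

For $\hat\rho_{\sigma,n} = 0$ and n = 10,000 the interval length multiplied by $\sqrt{n}$ is close to $2 \cdot 1.96 \cdot \sqrt{2} \approx 5.54$.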


3. Simulations 


We want to investigate numerically the usefulness of the asymptotics in finite samples. We
compute 95% confidence intervals based on the spatial sign correlation with and without the
transformation h, denoted in the tables below by sscor-h and sscor, respectively. The simulations are
done with the statistical software R (R Development Core Team). We sample from bivariate
elliptical distributions using the package mvtnorm (Genz et al., 2014). The central location is
computed by the spatial median from the package pcaPP (Filzmoser et al., 2011), and the scales
are estimated by the $Q_n$ implemented in the package robustbase (Rousseeuw et al., 2014).


Pearson’s moment correlation with and without Fisher’s z-transform (denoted by cor-z and
cor, respectively) serves as a benchmark. Under ellipticity, the asymptotic variance of the
moment correlation additionally depends on the kurtosis $\kappa$. We estimate the latter by the following
multivariate kurtosis estimator

$$\hat\kappa_n = \frac{3}{p(p + 2)}\, \frac{1}{n} \sum_{i=1}^{n} \{(X_i - \bar X_n)^T \hat\Sigma_n^{-1} (X_i - \bar X_n)\}^2 - 3,$$

where $\bar X_n$ denotes the sample mean and $\hat\Sigma_n$ the sample covariance matrix (e.g. Anderson, 2003,
p. 103). Alternatively, one may estimate the kurtosis by averaging the componentwise marginal
sample kurtoses, as it is done, e.g., in Vogel and Fried (2011).
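
For the bivariate case, a kurtosis estimator of this Mahalanobis-distance type can be sketched as follows. The code is an illustration under assumptions: the sample covariance matrix uses the divisor n - 1, and the normalisation $3/\{p(p + 2)\}$ is chosen such that the estimator targets the marginal excess kurtosis $\kappa$ entering $(1 + \kappa/3)(1 - \rho^2)^2$. The function name and data are hypothetical.

```python
def kurtosis_bivariate(x, y):
    """Mahalanobis-distance based excess kurtosis estimate for paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x) / (n - 1)
    syy = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy * sxy
    total = 0.0
    for u, v in zip(x, y):
        du, dv = u - mx, v - my
        # Squared Mahalanobis distance via the explicit 2x2 inverse.
        d2 = (syy * du * du - 2.0 * sxy * du * dv + sxx * dv * dv) / det
        total += d2 * d2
    p = 2
    return 3.0 / (p * (p + 2)) * total / n - 3.0

# Illustrative data (hypothetical); the estimate is invariant under
# marginal changes of location and scale.
x = [0.2, 1.1, -0.7, 2.3, -1.5, 0.9, -0.1, 1.8]
y = [0.5, 0.9, -1.0, 2.0, -0.8, 0.2, 0.3, 1.1]
k1 = kurtosis_bivariate(x, y)
k2 = kurtosis_bivariate([2.0 * u + 1.0 for u in x], [-0.5 * v for v in y])
```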


In Table 1, covering frequencies of the generalized correlation coefficient $\rho$ by the various
confidence intervals are given based on 10,000 repetitions for each parameter setting. We consider
the normal distribution and the t-distribution with 5 and 3 degrees of freedom, true correlations
of $\rho = 0$ and $\rho = 0.5$ and six different sample sizes ranging from n = 10 to n = 10,000. We
see that the sscor-h confidence intervals, i.e., the spatial-sign-based intervals with transformation h, are
almost exact in all cases considered, already for n = 10. The spatial-sign-based confidence intervals
without transformation reach a comparable accuracy only for n = 50, and confidence intervals
based on the Pearson correlation (with and without z-transformation) no sooner than n = 100 at
normality and n = 500 at the $t_5$ distribution. Table 2 reports the corresponding average lengths
of the confidence intervals multiplied by $\sqrt{n}$. Comparing these average lengths for the Pearson
correlation and spatial sign correlation, we rediscover roughly the square root of the ratio of the
asymptotic variances, e.g., for the normal distribution at $\rho = 0$, we have $5.54/3.92 = 1.413 \approx \sqrt{2}$.
At normality, the confidence intervals based on the Pearson correlation (the maximum likelihood
estimator for $\rho$ in this case) are shorter, whereas the sscor confidence intervals are shorter at the $t_5$
distribution - at least in larger samples, where all confidence intervals have the same 95% covering
probability. Thus, in a heavy-tailed setting like the $t_5$ distribution, the spatial-sign-based confidence
intervals are superior - in terms of covering accuracy as well as length. Further, we observe that
the strict asymptotic distribution-freeness of the spatial sign correlation practically also extends
to the finite-sample case. In both tables, the results for the spatial sign correlation are essentially
the same for the three different elliptical distributions. In contrast, the Pearson correlation shows
a considerably worse finite-sample behavior at the $t_5$ than at the normal distribution.
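
A scaled-down version of such a coverage experiment can be sketched in plain Python. This is not the simulation set-up used for the tables: it replaces the spatial median by the componentwise median and the $Q_n$ by the raw MAD (choices made here only to stay within the standard library), uses far fewer repetitions, and covers only the bivariate normal case. All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def mad(y):
    m = statistics.median(y)
    return statistics.median([abs(v - m) for v in y])

def two_stage_sscor(x, y):
    sx, sy = mad(x), mad(y)
    zx, zy = [v / sx for v in x], [v / sy for v in y]
    tx, ty = statistics.median(zx), statistics.median(zy)
    n = len(x)
    s11 = s12 = s22 = 0.0
    for u, v in zip(zx, zy):
        du, dv = u - tx, v - ty
        r2 = du * du + dv * dv
        if r2 > 0.0:
            s11 += du * du / r2 / n
            s12 += du * dv / r2 / n
            s22 += dv * dv / r2 / n
    return s12 * (s11 + s22) / math.sqrt((s11 ** 2 + s12 ** 2) * (s22 ** 2 + s12 ** 2))

def h(x):
    w = math.sqrt(max(0.0, 1.0 - x * x))
    arg = max(-1.0, min(1.0, (3.0 * (1.0 - w) - 2.0) / (w + 1.0)))
    return math.copysign((math.asin(arg) + math.pi / 2.0) / math.sqrt(2.0), x)

def h_inv(y):
    c = math.cos(math.sqrt(2.0) * abs(y))
    return math.copysign(2.0 ** 1.5 * math.sqrt(max(0.0, 1.0 - c)) / (3.0 - c), y)

def coverage(rho, n, reps, seed=1):
    """Empirical coverage of the sscor-h type 95% interval at the bivariate normal."""
    rng = random.Random(seed)
    q = NormalDist().inv_cdf(0.975)
    h_max = math.pi / math.sqrt(2.0)
    hits = 0
    for _ in range(reps):
        x, y = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x.append(z1)
            y.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        center = h(two_stage_sscor(x, y))
        lower = h_inv(max(-h_max, center - q / math.sqrt(n)))
        upper = h_inv(min(h_max, center + q / math.sqrt(n)))
        hits += lower <= rho <= upper
    return hits / reps

cov = coverage(rho=0.5, n=50, reps=400)
```

With only 400 repetitions the Monte Carlo error is about one percentage point, so the result should be read as a rough plausibility check rather than a replication of the tables.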

The fourth moments of the $t_3$ distribution are not finite, i.e., the kurtosis does not exist,
and the moment correlation is not $\sqrt{n}$-consistent when sampling from a $t_3$ distribution. Hence
the usual construction of the moment-correlation-based confidence intervals has no mathematical
justification. However, the bottom parts of Tables 1 and 2 indicate that, when ignoring this
fact, Pearson’s moment correlation nevertheless provides somewhat useful, approximate confidence
intervals. While for small n the moment-correlation-based confidence intervals are short but have
a too low coverage probability, they reach 95% in large samples, but are, in comparison to, e.g.,
the sscor-based confidence intervals, very long. This at first unexpected observation is not
completely surprising, since the length of the confidence intervals is largely determined by the
sample kurtosis. The slower convergence of the sample moment correlation to $\rho$ and the exploding
behavior of the sample kurtosis are opposing effects, which appear to basically cancel each other.

Altogether, the spatial sign correlation with variance-stabilizing transformation h yields very reliable
confidence bands, which are accurate also in very small samples.

| distribution | n | sscor ($\rho$ = 0) | sscor-h ($\rho$ = 0) | cor ($\rho$ = 0) | cor-z ($\rho$ = 0) | sscor ($\rho$ = 0.5) | sscor-h ($\rho$ = 0.5) | cor ($\rho$ = 0.5) | cor-z ($\rho$ = 0.5) |
|---|---|---|---|---|---|---|---|---|---|
| normal | 10 | 86 | 94 | 77 | 83 | 87 | 93 | 78 | 83 |
| normal | 20 | 90 | 94 | 86 | 89 | 91 | 95 | 87 | 90 |
| normal | 50 | 93 | 95 | 92 | 93 | 93 | 95 | 92 | 93 |
| normal | 100 | 93 | 95 | 93 | 94 | 94 | 95 | 93 | 94 |
| normal | 500 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 |
| normal | 10000 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 |
| $t_5$ | 10 | 85 | 94 | 70 | 76 | 87 | 93 | 71 | 77 |
| $t_5$ | 20 | 90 | 95 | 81 | 85 | 90 | 95 | 80 | 85 |
| $t_5$ | 50 | 93 | 95 | 88 | 90 | 93 | 95 | 88 | 90 |
| $t_5$ | 100 | 94 | 95 | 91 | 92 | 94 | 95 | 91 | 92 |
| $t_5$ | 500 | 95 | 95 | 94 | 94 | 95 | 95 | 94 | 94 |
| $t_5$ | 10000 | 95 | 95 | 95 | 95 | 95 | 95 | 95 | 95 |
| $t_3$ | 10 | 85 | 94 | 64 | 71 | 87 | 93 | 66 | 72 |
| $t_3$ | 20 | 90 | 94 | 74 | 79 | 90 | 94 | 76 | 81 |
| $t_3$ | 50 | 93 | 95 | 82 | 86 | 93 | 95 | 82 | 85 |
| $t_3$ | 100 | 94 | 95 | 86 | 88 | 94 | 95 | 86 | 88 |
| $t_3$ | 500 | 95 | 95 | 90 | 91 | 94 | 95 | 90 | 92 |
| $t_3$ | 10000 | 95 | 95 | 94 | 95 | 95 | 95 | 94 | 94 |

Table 1: Empirical covering probabilities (%) of asymptotic 95% confidence intervals based on the spatial sign
correlation (sscor) and the moment correlation (cor) with and without variance-stabilizing transformation for bivariate
normal and t-distributions with 3 and 5 degrees of freedom, $\rho = 0$ and $\rho = 0.5$, and varying sample sizes n; 10,000
repetitions.
| distribution | n | sscor ($\rho$ = 0) | sscor-h ($\rho$ = 0) | cor ($\rho$ = 0) | cor-z ($\rho$ = 0) | sscor ($\rho$ = 0.5) | sscor-h ($\rho$ = 0.5) | cor ($\rho$ = 0.5) | cor-z ($\rho$ = 0.5) |
|---|---|---|---|---|---|---|---|---|---|
| normal | 10 | 4.75 | 4.11 | 2.83 | 2.67 | 4.08 | 3.69 | 2.23 | 2.17 |
| normal | 20 | 5.11 | 4.68 | 3.35 | 3.20 | 4.20 | 3.99 | 2.57 | 2.53 |
| normal | 50 | 5.36 | 5.15 | 3.68 | 3.60 | 4.26 | 4.18 | 2.79 | 2.77 |
| normal | 100 | 5.45 | 5.33 | 3.80 | 3.75 | 4.30 | 4.26 | 2.87 | 2.86 |
| normal | 500 | 5.52 | 5.50 | 3.89 | 3.88 | 4.31 | 4.30 | 2.92 | 2.92 |
| normal | 10000 | 5.54 | 5.54 | 3.92 | 3.92 | 4.32 | 4.32 | 2.94 | 2.94 |
| $t_5$ | 10 | 4.75 | 4.12 | 2.84 | 2.68 | 4.08 | 3.69 | 2.28 | 2.21 |
| $t_5$ | 20 | 5.11 | 4.68 | 3.63 | 3.45 | 4.22 | 4.01 | 2.83 | 2.77 |
| $t_5$ | 50 | 5.36 | 5.15 | 4.44 | 4.30 | 4.28 | 4.20 | 3.40 | 3.36 |
| $t_5$ | 100 | 5.45 | 5.34 | 4.92 | 4.82 | 4.30 | 4.26 | 3.74 | 3.71 |
| $t_5$ | 500 | 5.52 | 5.50 | 5.71 | 5.67 | 4.31 | 4.31 | 4.29 | 4.28 |
| $t_5$ | 10000 | 5.54 | 5.54 | 6.38 | 6.38 | 4.32 | 4.31 | 4.79 | 4.79 |
| $t_3$ | 10 | 4.73 | 4.10 | 2.81 | 2.65 | 4.09 | 3.70 | 2.27 | 2.21 |
| $t_3$ | 20 | 5.11 | 4.68 | 3.77 | 3.58 | 4.22 | 4.01 | 2.99 | 2.92 |
| $t_3$ | 50 | 5.36 | 5.15 | 5.08 | 4.87 | 4.28 | 4.20 | 3.94 | 3.87 |
| $t_3$ | 100 | 5.45 | 5.34 | 6.15 | 5.95 | 4.30 | 4.26 | 4.72 | 4.66 |
| $t_3$ | 500 | 5.52 | 5.50 | 9.12 | 8.96 | 4.31 | 4.31 | 6.90 | 6.85 |
| $t_3$ | 10000 | 5.54 | 5.54 | 17.57 | 17.46 | 4.32 | 4.32 | 13.02 | 12.99 |

Table 2: Average lengths of 95% confidence intervals based on the spatial sign correlation (sscor) and the moment
correlation (cor) with and without variance-stabilizing transformation for bivariate normal and t-distributions with
3 and 5 degrees of freedom, $\rho = 0$ and $\rho = 0.5$, and varying sample sizes n; 10,000 repetitions.


4. Conclusion 


The spatial sign correlation, as introduced in Dürre et al. (2015), cf. (2), is a robust correlation
estimator which has a variety of nice properties. It is fast to compute, it is distribution-free within
the elliptical model, its efficiency is comparable to other estimators offering a similar degree of
robustness, and the explicit form of the asymptotic variance facilitates inferential procedures. In
this article we have addressed its main drawback: the inefficiency under strongly shaped models,
i.e., where the eigenvalues of the shape matrix strongly differ. The shapedness due to different
marginal scales may be eliminated by a componentwise standardization before computing
the spatial sign correlation. We have shown that the resulting two-step estimator has the same
asymptotic distribution as the spatial sign correlation applied to a sample from a model with
equal marginal scales. An important consequence is that the parameter a, cf. (3), i.e., the ratio of
the marginal scales, drops from the expression for the asymptotic variance. The only parameter
left is the generalized correlation coefficient $\rho$ itself. This allows us to devise a variance-stabilizing
transformation similar to Fisher’s z-transformation, which, contrary to Fisher’s transform, is valid
for all elliptical distributions. The prior standardization makes the spatial sign correlation really
a practical estimator.
Appendix A. Proofs

In the proof of Theorem 1 we make use of the following lemma, which states that the empirical
versions of $\Gamma_j$ with and without location estimation are asymptotically equivalent.

Lemma A1. Let $t \in \mathbb{R}^p$ and X be a p-variate random vector with distribution F satisfying


(I) $E|X - t|^{-2/3} < \infty$.


Let further $\mathbb{X}_n = (X_1, \ldots, X_n)^T$ be an iid sample drawn from F and $t_n$ a series of p-variate random
vectors satisfying

(II) $\sqrt{n}(t_n - t) = O_P(1)$.

Finally, let A be a diagonal p x p matrix with positive diagonal entries. Then, for all $1 \leq j \leq p$,



converges to zero in probability as $n \to \infty$.

Proof. To shorten notation and without loss of generality we will assume that t = 0 and $A = I_p$.
We will show componentwise convergence, i.e.



i=l 


(A.l) 


as $n \to \infty$ for all $1 \leq j, k, l \leq p$. We use the following random partition of $\mathbb{R}^p$:

(A.2)

and the corresponding random partition of the index set {1, ..., n}:

$I_n = \{1 \leq i \leq n \mid X_i \in B_n\}$, $I_n^c = \{1 \leq i \leq n \mid X_i \in B_n^c\}$.

Letting $K_i$ denote the summands in (A.1), we write



(A.3) 
For the second sum on the right-hand side of (A.3) we make use of $|K_i| \leq 2$ to obtain
$|n^{-1} \sum_{i \in I_n^c} K_i| \leq 2 n^{-1} \sum_{i=1}^{n} 1_{B_n^c}(X_i)$. The right-hand side of the last inequality is shown
to converge to zero in probability under the assumptions of Lemma A1 as in the proof of
Theorem 1 in Dürre et al. (2014).
The first sum on the right-hand side of (A.3) is decomposed into


1 


n 




1 {(Xp) - - tn)^^\Xi - tnP\Xi\ 


Aj)\2 




+ -E 

71 


Aj)\2+ik)i 


iXi-UW* 


\Xi - tn\^\Xi\‘^ 

iGln i^In 


I Xi — tn r I Xr 


+ i V - 1^* - 

n ^ 




Call the four terms from left to right $T_1$, $T_2$, $T_3$, $T_4$. Since $X_i \in B_n$ implies $|X_i| < 2|X_i - t_n|$, we
have


misiE 

n 


- 2Xp))(Xi - tn)^'^Hx, - 


< 


:E 


tlPdiP - xf) 


\Xi - tr. 


\x- — f |4|tr-|4 

\^i ^n\ \^%\ 

+ ^E 






x* - tr, 


^ 2 ^ |tn^| 4 y-v |t 


(i)i 


n 




n 




Xi 


< - 
n 


6 A 


uuy 1 

Em = 6 Vn|t(f'E^E 


7=1 


2 = 1 


1 


0 , 


since the term in $\{\cdot\}$ converges to zero almost surely by Marcinkiewicz's law of large numbers
(Loève, 1977, p. 255). Convergence to zero of the remaining terms $T_2$, $T_3$ and $T_4$ is shown analogously.
The proof of Lemma A1 is complete. □


Remark: One can see from the last displayed line that, similarly to Theorem 1 of Dürre et al. (2014), the lemma can be proven also under slightly different conditions. For instance, assumption (II) can be weakened to convergence of $t_n$ to $t$ in exchange for a stronger moment condition on $|X - t|$.

We are now ready to prove Theorem 


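Proof of Theorem. Let $\tilde t_n(\mathbb{X}_n) = A_n^{-1} t_n(\mathbb{X}_n A_n)$¹ and write $\sqrt{n}\{S_n(\mathbb{X}_n A_n, t_n(\cdot)) - S_n(\mathbb{X}_n A, At)\}$ as

\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{A_n X_i - A_n \tilde t_n(\mathbb{X}_n)\}\{A_n X_i - A_n \tilde t_n(\mathbb{X}_n)\}^T}{\{A_n X_i - A_n \tilde t_n(\mathbb{X}_n)\}^T\{A_n X_i - A_n \tilde t_n(\mathbb{X}_n)\}}
- \frac{\{A X_i - A \tilde t_n(\mathbb{X}_n)\}\{A X_i - A \tilde t_n(\mathbb{X}_n)\}^T}{\{A X_i - A \tilde t_n(\mathbb{X}_n)\}^T\{A X_i - A \tilde t_n(\mathbb{X}_n)\}}
\right]
\]
\[
+ \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{A X_i - A \tilde t_n(\mathbb{X}_n)\}\{A X_i - A \tilde t_n(\mathbb{X}_n)\}^T}{\{A X_i - A \tilde t_n(\mathbb{X}_n)\}^T\{A X_i - A \tilde t_n(\mathbb{X}_n)\}}
- \frac{\{A X_i - A t\}\{A X_i - A t\}^T}{\{A X_i - A t\}^T\{A X_i - A t\}}
\right].
\]

¹ Technically, $\tilde t_n$ is a function of $\mathbb{X}_n$ as well as $A_n$. We can understand $\tilde t_n(\mathbb{X}_n)$ as a short-hand notation, where the dependence on $A_n$ is simply suppressed, but the notation is also justified in the sense that $A_n$ usually is a function of $\mathbb{X}_n$.

Call the first term $T_1$ and the second $T_2$. The convergence of $T_2$ to zero in probability follows with Dürre et al. (2014, Theorem 2). Let $\tilde X_i = A X_i$, $\tau_n = A \tilde t_n$ and $\tau = A t$. Then Theorem 2 of Dürre et al. (2014) essentially states that $\sqrt{n}\{S_n(\tilde{\mathbb{X}}_n, \tau_n) - S_n(\tilde{\mathbb{X}}_n, \tau)\} \xrightarrow{p} 0$, where $\tilde{\mathbb{X}}_n = (\tilde X_1, \ldots, \tilde X_n)^T$.

This is not stated explicitly in the text of the theorem, but this is what is proven. To check that the assumptions are met, note that by the conditions of the theorem we have

\[
\begin{aligned}
\sqrt{n}(\tau_n - \tau) &= A \sqrt{n}\,\bigl(A_n^{-1} t_n(\mathbb{X}_n A_n) - t\bigr) \\
&= A \sqrt{n}\,\bigl(A_n^{-1} t_n(\mathbb{X}_n A_n) - t_n(\mathbb{X}_n)\bigr) + A \sqrt{n}\,\bigl(t_n(\mathbb{X}_n) - t\bigr) \\
&= \sqrt{n}\, A A_n^{-1} \bigl(t_n(\mathbb{X}_n A_n) - A_n t_n(\mathbb{X}_n)\bigr) + A \sqrt{n}\,\bigl(t_n(\mathbb{X}_n) - t\bigr) = O_p(1).
\end{aligned}
\]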


The latter is sufficient (along with continuity of $F$), cf. the remarks below Theorem 3 in Dürre et al. (2014). We are thus left to prove $T_1 \xrightarrow{d} \Xi_p$. Let $Y_i = X_i - \tilde t_n(\mathbb{X}_n)$, where we suppress the dependence on $n$ in this short-hand notation. Then $T_1$ can be further decomposed into


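\[
T_1 = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{A_n Y_i Y_i^T A_n - A Y_i Y_i^T A}{Y_i^T A^2 Y_i}
+ \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\{Y_i^T (A^2 - A_n^2) Y_i\}\, A_n Y_i Y_i^T A_n}{Y_i^T A_n^2 Y_i \; Y_i^T A^2 Y_i}.
\]

We call the terms $T_{1,a}$ and $T_{1,b}$, where we have

\[
T_{1,a} = A_n A^{-1} \left( \frac{1}{n} \sum_{i=1}^n \frac{A Y_i Y_i^T A}{Y_i^T A^2 Y_i} \right) A^{-1} \sqrt{n}(A_n - A)
+ \sqrt{n}(A_n - A) A^{-1} \left( \frac{1}{n} \sum_{i=1}^n \frac{A Y_i Y_i^T A}{Y_i^T A^2 Y_i} \right),
\]

which converges in distribution to $S(F_0, 0) A^{-1} Z + Z A^{-1} S(F_0, 0)$, since

\[
\frac{1}{n} \sum_{i=1}^n \frac{A Y_i Y_i^T A}{Y_i^T A^2 Y_i} \xrightarrow{p} S(F_0, 0)
\]

by Theorem 1 in Dürre et al. (2014). Writing $T_{1,b}$ as $T_{1,b} = \mathcal{L} + \mathcal{R}$ with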

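\[
\mathcal{L} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{2 \{Y_i^T (A - A_n) A Y_i\}\, A Y_i Y_i^T A}{(Y_i^T A^2 Y_i)^2},
\]
\[
\mathcal{R} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{Y_i^T (A - A_n)(A + A_n) Y_i\}\, A_n Y_i Y_i^T A_n}{Y_i^T A_n^2 Y_i \; Y_i^T A^2 Y_i}
- \frac{2 \{Y_i^T (A - A_n) A Y_i\}\, A Y_i Y_i^T A}{(Y_i^T A^2 Y_i)^2}
\right],
\]

we find for $\mathcal{L}$ by using Lemma A1

\[
\mathcal{L} = 2 \sum_{j=1}^p \{A^{-1} \sqrt{n}(A - A_n)\}_{j,j} \; \frac{1}{n} \sum_{i=1}^n \frac{\{(A Y_i)^{(j)}\}^2 A Y_i Y_i^T A}{(Y_i^T A^2 Y_i)^2}
\xrightarrow{d} -2 \sum_{j=1}^p (A^{-1} Z)_{j,j}\, \Gamma_j.
\]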

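It remains to show that $\mathcal{R}$ vanishes asymptotically. We further decompose $\mathcal{R}$ into

\[
\begin{aligned}
& \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{Y_i^T (A - A_n)(A + A_n) Y_i\}\, A_n Y_i Y_i^T A_n}{Y_i^T A_n^2 Y_i \; Y_i^T A^2 Y_i}
- \frac{\{Y_i^T (A - A_n)(A + A_n) Y_i\}\, A_n Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2}
\right] \\
& + \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{Y_i^T (A - A_n)(A + A_n) Y_i\}\, A_n Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2}
- \frac{\{Y_i^T (A - A_n)\, 2A\, Y_i\}\, A_n Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2}
\right] \\
& + \frac{2}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{Y_i^T (A - A_n) A Y_i\}\, A_n Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2}
- \frac{\{Y_i^T (A - A_n) A Y_i\}\, A Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2}
\right] \\
& + \frac{2}{\sqrt{n}} \sum_{i=1}^n \left[
\frac{\{Y_i^T (A - A_n) A Y_i\}\, A Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2}
- \frac{\{Y_i^T (A - A_n) A Y_i\}\, A Y_i Y_i^T A}{(Y_i^T A^2 Y_i)^2}
\right]
\end{aligned}
\]

and denote the four terms by $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$ and $\mathcal{S}_4$, respectively. For $\mathcal{S}_1$ we get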


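\[
\begin{aligned}
|\mathcal{S}_1| &\le \frac{1}{\sqrt{n}} \sum_{i=1}^n \left| \frac{\{Y_i^T (A - A_n)(A + A_n) Y_i\}^2\, A_n Y_i Y_i^T A_n}{Y_i^T A_n^2 Y_i \,(Y_i^T A^2 Y_i)^2} \right|
\le \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{ \frac{Y_i^T (A - A_n)(A + A_n) Y_i}{Y_i^T A^2 Y_i} \right\}^2 \\
&= \frac{1}{\sqrt{n}} \sum_{j=1}^p \sum_{k=1}^p \sqrt{n}\,(A^2 - A_n^2)_{j,j}\; \sqrt{n}\,(A^2 - A_n^2)_{k,k}\; \frac{1}{n} \sum_{i=1}^n \frac{(Y_i^{(j)})^2 (Y_i^{(k)})^2}{(Y_i^T A^2 Y_i)^2},
\end{aligned}
\]

which converges to zero in probability. For $\mathcal{S}_2$, we obtain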


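\[
\begin{aligned}
\mathcal{S}_2 &= \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\{Y_i^T (A - A_n)(A_n - A) Y_i\}\, A_n Y_i Y_i^T A_n}{(Y_i^T A^2 Y_i)^2} \\
&= - A_n A^{-1} \left[ \sum_{j=1}^p \sqrt{n}\, \{A^{-1}(A_n - A)\}_{j,j}^2 \; \frac{1}{n} \sum_{i=1}^n \frac{\{(A Y_i)^{(j)}\}^2 A Y_i Y_i^T A}{(Y_i^T A^2 Y_i)^2} \right] A^{-1} A_n \xrightarrow{p} 0,
\end{aligned}
\]

where we have again used Lemma A1. Similar calculations yield that $\mathcal{S}_3 = o_p(1)$ and $\mathcal{S}_4 = o_p(1)$ as
$n \to \infty$. Note that, although we have treated $T_{1,a}$ and $\mathcal{L}$ individually, they converge in fact jointly.
Both are essentially linear functions of $\sqrt{n}(A_n - A)$. The proof of Theorem is complete. □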
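
The algebra behind the decomposition of $T_1$ used in this proof is purely deterministic, so it can be sanity-checked numerically for a single observation. The following is a minimal sketch, not part of the original argument: the function name `summands`, the dimension and the random test matrices are illustrative assumptions, and the common factor $1/\sqrt{n}$ is dropped because it does not affect the identities.

```python
import numpy as np


def summands(A, A_n, y):
    """Per-observation terms of the decomposition (1/sqrt(n) factor omitted).

    A and A_n are diagonal matrices with positive entries, y is one vector Y_i.
    Returns the summand of T_1, of T_{1,a}, of T_{1,b}, of L, and of the four
    remainder terms S_1, ..., S_4.
    """
    q = y @ A @ A @ y                # y' A^2 y
    q_n = y @ A_n @ A_n @ y          # y' A_n^2 y
    yy = np.outer(y, y)
    d = A - A_n
    c_sum = y @ d @ (A + A_n) @ y    # y'(A - A_n)(A + A_n)y = y'(A^2 - A_n^2)y
    c_lin = y @ d @ A @ y            # y'(A - A_n)A y

    t1 = A_n @ yy @ A_n / q_n - A @ yy @ A / q
    t1a = (A_n @ yy @ A_n - A @ yy @ A) / q
    t1b = c_sum * (A_n @ yy @ A_n) / (q_n * q)
    ell = 2.0 * c_lin * (A @ yy @ A) / q**2
    s1 = c_sum * (A_n @ yy @ A_n) * (1.0 / (q_n * q) - 1.0 / q**2)
    s2 = (c_sum - 2.0 * c_lin) * (A_n @ yy @ A_n) / q**2
    s3 = 2.0 * c_lin * ((A_n - A) @ yy @ A_n) / q**2
    s4 = 2.0 * c_lin * (A @ yy @ (A_n - A)) / q**2
    return t1, t1a, t1b, ell, (s1, s2, s3, s4)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(rng.uniform(0.5, 2.0, size=3))
    A_n = A + np.diag(rng.uniform(-0.1, 0.1, size=3))   # stays positive
    y = rng.normal(size=3)
    t1, t1a, t1b, ell, s = summands(A, A_n, y)
    print(np.allclose(t1, t1a + t1b), np.allclose(t1b, ell + sum(s)))
```

The check relies on $A$ and $A_n$ being diagonal, so that they commute and $A^2 - A_n^2 = (A - A_n)(A + A_n)$.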

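Proof of Corollary. As in Theorem, let $X_0 = A(X - t)$. Then $X_0 \sim F_0 \in \mathcal{E}_2(0, V_0)$. Since $V_0$
has equal diagonal elements, its eigenvalue decomposition is given by $V_0 = U \Lambda U^T$, where

\[
U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = c \begin{pmatrix} 1 - \rho & 0 \\ 0 & 1 + \rho \end{pmatrix}
\tag{A.4}
\]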


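for some $c > 0$. Hence, by Proposition 1 of Dürre et al. (2015), we have

\[
S(F_0, 0) = \begin{pmatrix} 1/2 & \delta \\ \delta & 1/2 \end{pmatrix}
\]

with $\delta = (1 - \sqrt{1 - \rho^2})/(2\rho)$ if $\rho \neq 0$ and $\delta = 0$ otherwise, and hence

\[
A^{-1} Z S(F_0, 0) + S(F_0, 0) A^{-1} Z =
\begin{pmatrix} Z_1/a_1 & (Z_1/a_1 + Z_2/a_2)\,\delta \\ (Z_1/a_1 + Z_2/a_2)\,\delta & Z_2/a_2 \end{pmatrix}.
\tag{A.5}
\]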
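
The closed-form entry $\delta$, and the fourth-order moments of the spatial sign that enter the remaining part of this proof, can be cross-checked by quadrature over the unit circle. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes the shape matrix with unit diagonal and off-diagonal $\rho$, uses a Cholesky factor as the matrix square root (any square root gives the same distribution of the direction, since $Y$ is spherical), and the names `sign_moments` and `closed_forms` are hypothetical. The value $\beta = \delta/2$ used in the comparison follows from $|s| = 1$, which gives $E(s_1^3 s_2) + E(s_1 s_2^3) = E(s_1 s_2)$, together with the symmetry of the two coordinates under equal marginal scales.

```python
import numpy as np


def sign_moments(rho, m=4096):
    """Moments of the spatial sign s = X0/|X0| for a centred bivariate
    elliptical vector with shape matrix [[1, rho], [rho, 1]].

    The direction of a spherical vector is uniform on the unit circle, so each
    expectation is an average over an equispaced grid of angles.
    """
    theta = 2.0 * np.pi * np.arange(m) / m
    u = np.vstack([np.cos(theta), np.sin(theta)])
    root = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    x = root @ u
    s1, s2 = x / np.linalg.norm(x, axis=0)
    return {
        "delta": np.mean(s1 * s2),          # off-diagonal entry of S(F0, 0)
        "alpha": np.mean(s1**4),
        "beta": np.mean(s1**3 * s2),
        "gamma": np.mean(s1**2 * s2**2),
    }


def closed_forms(rho):
    """Closed-form counterparts of the moments above."""
    if rho == 0.0:
        return {"delta": 0.0, "alpha": 0.375, "beta": 0.0, "gamma": 0.125}
    r = np.sqrt(1.0 - rho**2)
    delta = (1.0 - r) / (2.0 * rho)
    return {
        "delta": delta,
        "alpha": (r + 2.0 * rho**2 - 1.0) / (4.0 * rho**2),
        "beta": delta / 2.0,
        "gamma": (1.0 - r) / (4.0 * rho**2),
    }


if __name__ == "__main__":
    for rho in (-0.7, 0.0, 0.5):
        num, exact = sign_moments(rho), closed_forms(rho)
        print(rho, all(np.isclose(num[k], exact[k]) for k in exact))
```

Because the integrands are smooth and periodic, the equispaced average converges very quickly, so a few thousand grid points are more than enough here.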
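To compute the remaining part $-2 \sum_{j=1}^2 (Z_j/a_j)\, \Gamma_j$, we have to evaluate the integrals $\Gamma_j$, $j = 1, 2$.

Towards this end, we write $X_0 = U \Lambda^{1/2} Y$, where $U$ and $\Lambda$ are as in (A.4) and $Y$ has a spherical
distribution, and consider the matrix

\[
W = E\left[ \operatorname{vec}\left\{ \frac{X_0 X_0^T}{X_0^T X_0} \right\} \operatorname{vec}\left\{ \frac{X_0 X_0^T}{X_0^T X_0} \right\}^T \right]
= (U \otimes U)\, E\left[ \operatorname{vec}\left\{ \frac{\Lambda^{1/2} Y Y^T \Lambda^{1/2}}{Y^T \Lambda Y} \right\} \operatorname{vec}\left\{ \frac{\Lambda^{1/2} Y Y^T \Lambda^{1/2}}{Y^T \Lambda Y} \right\}^T \right] (U \otimes U)^T.
\]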

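The expectation on the right-hand side is independent of the elliptical generator $g$ and is given as
an explicit function of $\lambda_1$ and $\lambda_2$ in the proof of Proposition 2(3) in Dürre et al. (2015). Plugging
in our specific forms of $\Lambda$ and $U$, cf. (A.4), we obtain

\[
W = \begin{pmatrix}
\alpha & \beta & \beta & \gamma \\
\beta & \gamma & \gamma & \beta \\
\beta & \gamma & \gamma & \beta \\
\gamma & \beta & \beta & \alpha
\end{pmatrix}
\]

with

\[
\alpha = \frac{\sqrt{1 - \rho^2} + 2\rho^2 - 1}{4\rho^2}, \qquad
\beta = \frac{1 - \sqrt{1 - \rho^2}}{4\rho}, \qquad
\gamma = \frac{1 - \sqrt{1 - \rho^2}}{4\rho^2}
\]

if $\rho \neq 0$, and $\alpha = 3/8$, $\beta = 0$, $\gamma = 1/8$ if $\rho = 0$. Since $W$ contains $\Gamma_1$ as upper diagonal block and
$\Gamma_2$ as lower diagonal block, we obtain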


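\[
-2 \left( \frac{Z_1}{a_1} \Gamma_1 + \frac{Z_2}{a_2} \Gamma_2 \right) = -2
\begin{pmatrix}
\alpha Z_1/a_1 + \gamma Z_2/a_2 & \beta (Z_1/a_1 + Z_2/a_2) \\
\beta (Z_1/a_1 + Z_2/a_2) & \gamma Z_1/a_1 + \alpha Z_2/a_2
\end{pmatrix}.
\tag{A.6}
\]

Putting (A.5) and (A.6) together and using $\delta = 2\beta$ and $1 - 2\alpha = 2\gamma$, we finally arrive at

\[
\Xi_2 = 2\gamma \begin{pmatrix} Z_1/a_1 - Z_2/a_2 & 0 \\ 0 & Z_2/a_2 - Z_1/a_1 \end{pmatrix},
\]

which completes the proof of Corollary. □

For the proof of Theorem we require a slight generalization of the delta method.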


Lemma A2. Let $(U_n)_{n \in \mathbb{N}}$ be a sequence of $p$-dimensional random vectors and $(a_n)_{n \in \mathbb{N}}$ a sequence of
real numbers such that $a_n \to \infty$ as $n \to \infty$ and

(I) $a_n(U_n - u) = O_p(1)$ as $n \to \infty$ for some $u \in \mathbb{R}^p$. Let furthermore

(II) $h : \mathbb{R}^p \to \mathbb{R}$ be continuously differentiable at $u = (u_1, \ldots, u_p)^T$ with $\partial h(u)/\partial u_i = 0$ for all $i \in I$
for some subset $I \subset \{1, \ldots, p\}$, and

(III) $a_n [U_n - u]_{I^C} \xrightarrow{d} \Psi$, where $[U_n - u]_{I^C}$ denotes the random vector obtained from $U_n - u$ by
deleting all components in $I$.

Then $a_n(h(U_n) - h(u)) \xrightarrow{d} [h'(u)]_{I^C}\, \Psi$.

If $I = \emptyset$, Lemma A2 boils down to the usual delta method. If some components of $h'(u)$ are
zero (which are gathered in the index set $I$), it suffices to ensure the joint convergence of the
remaining components of $a_n(U_n - u)$ and the boundedness in probability of $a_n(U_n - u)$ to conclude
the convergence of $a_n(h(U_n) - h(u))$.


Proof of Lemma A2. The proof is similar to the proof of Lemma 5.3.2 in Bickel and Doksum (2001).
Since $h$ is continuously differentiable, for every $\varepsilon > 0$ there exists a $\delta > 0$ such that

\[
|u - v| < \delta \quad \Longrightarrow \quad |h(v) - h(u) - h'(u)(v - u)| \le \varepsilon |v - u|.
\tag{A.7}
\]

Condition (I) implies that $U_n \xrightarrow{p} u$, i.e., $P(|U_n - u| < \delta) \to 1$. Thus using (A.7), we have for
every $\varepsilon > 0$ that $P(|h(U_n) - h(u) - h'(u)(U_n - u)| \le \varepsilon |U_n - u|) \to 1$, which implies $a_n(h(U_n) - h(u) - h'(u)(U_n - u)) = o_p(|a_n(U_n - u)|) = o_p(1)$. The latter may be re-written as

\[
a_n(h(U_n) - h(u)) = a_n h'(u)(U_n - u) + o_p(1),
\]

and the result follows by Conditions (II) and (III) and Slutsky's lemma. □


Proof of Theorem. We write

\[
\sqrt{n}(\hat\rho_{\sigma,n} - \rho) = \sqrt{n} \left( \gamma\{\operatorname{vec} S_n(\mathbb{X}_n A_n, t_n(\cdot))\} - \gamma\{\operatorname{vec} S(F_0, 0)\} \right),
\]

where $F_0$ is, as in Theorem, the distribution of $X_0 = A(X - t)$, and $\gamma : \mathbb{R}^4 \to \mathbb{R}$ is the function that
maps the (vectorized) two-dimensional spatial sign covariance matrix of an elliptical distribution to
the corresponding generalized correlation coefficient. The function $\gamma$ is given explicitly in the main text. Its derivative
$\gamma'$ is computed in the proof of Proposition 5 in Dürre et al. (2015). Since $F_0$ has equal marginal
scales, i.e., $a = 1$, we have

\[
\gamma'\{\operatorname{vec} S(F_0, 0)\} = \left( 0, \; 0, \; 2\sqrt{1 - \rho^2}\,\bigl(1 + \sqrt{1 - \rho^2}\bigr), \; 0 \right).
\]

We further decompose

\[
\sqrt{n} \operatorname{vec}\bigl( S_n(\mathbb{X}_n A_n, t_n(\cdot)) - S(F_0, 0) \bigr)
= \sqrt{n} \operatorname{vec}\bigl( S_n(\mathbb{X}_n A_n, t_n(\cdot)) - S_n(\mathbb{X}_n A, At) \bigr) + \sqrt{n} \operatorname{vec}\bigl( S_n(\mathbb{X}_n A, At) - S(F_0, 0) \bigr),
\tag{A.8}
\]

where we call the two terms on the right-hand side $T_1$ and $T_2$. We deduce two things: First,

\[
\sqrt{n} \operatorname{vec}\bigl( S_n(\mathbb{X}_n A_n, t_n(\cdot)) - S(F_0, 0) \bigr) = O_p(1) \quad \text{as } n \to \infty,
\]

since $T_1 \xrightarrow{d} \operatorname{vec} \Xi_2$ by Theorem and $T_2$ converges in distribution as a corollary of the central
limit theorem (or as a special case of Proposition 2 in Dürre et al. (2015)). Second, the third
component of (A.8) converges in distribution to the same limit as $T_2^{(3)}$, since $T_1^{(3)}$ converges to
zero in probability by Corollary. Here we use $(\cdot)^{(3)}$ to denote the third component of a vector.

The asymptotic distribution of $T_2$ is given by Proposition 2 in Dürre et al. (2015). Making use of
the particular structure of $V_0$, i.e., equal diagonal elements, cf. (A.4), we obtain $T_2^{(3)} \xrightarrow{d} N(0, w)$
with $w = (\sqrt{1 - \rho^2} + \rho^2 - 1)/(2\rho)^2$ if $\rho \neq 0$ and $w = 1/8$ if $\rho = 0$. Applying Lemma A2 with $\gamma$ in
the role of $h$, and $I^C = \{3\}$, we obtain

\[
\sqrt{n}(\hat\rho_{\sigma,n} - \rho) \xrightarrow{d} [\gamma'\{\operatorname{vec} S(F_0, 0)\}]_{(1,3)} \cdot N(0, w) = N\bigl(0, \; (1 - \rho^2)^2 + (1 - \rho^2)^{3/2}\bigr).
\]

Note that $\gamma'(\cdot)$ is a $1 \times 4$ matrix. The proof of Theorem is complete. □

Proof of Corollary. By the delta method, the function $h$ has to satisfy
\[
h'(x) = \left\{ (1 - x^2)^2 + (1 - x^2)^{3/2} \right\}^{-1/2}.
\tag{A.9}
\]

The function $h$ given in Corollary fulfills this requirement and is further strictly increasing and
odd. To find the antiderivative of (A.9), we have used the computer algebra system Maxima (2014).
Substituting $z = 1 - \sqrt{1 - x^2}$ yields the integral $\int \{(2 - z)\sqrt{z(1 - z)}\}^{-1}\, dz$, for which Maxima gives
the primitive $2^{-1/2} \arcsin((3z - 2)/|z - 2|)$. □
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